AITopics | consistent weighted sampling

Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling

Ping Li, Xiaoyun Li, Cun-Hui Zhang

Neural Information Processing Systems, Oct-3-2025, 08:13:22 GMT

Jaccard similarity to be practical in large-scale settings.

international conference, non-empty bin, proceedings, (13 more...)

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
(12 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Reviews: Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling

Neural Information Processing Systems, Jun-1-2025, 06:03:51 GMT

The authors propose that the optimal densification for One Permutation Hashing (OPH) can be optimized further. In standard OPH, we apply one permutation to the sparse vector and break the permuted vector into K equal-sized bins. In the usual Consistent Weighted Sampling (CWS) approach, we sample non-empty bins from these K bins and retrieve a fixed hash code for them. In the new approach, the authors suggest treating each of the K bins as a separate sparse vector and performing MinHash on the retrieved bin to obtain a hash code, instead of reading the hash code off directly. The authors prove theoretically that this re-randomization achieves the smallest variance among densification schemes (which are used to retrieve hash codes for empty buckets). They also extend the idea to weighted non-negative sparse vectors with a method called Bin-wise CWS. The paper seems to be a subtle improvement over prior work.
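The binning and re-randomization steps described in the review can be illustrated with a minimal Python sketch. This is a reading of the review's description, not the authors' implementation: the function name `oph_sketch`, the string-keyed seeding, the rule for choosing a donor bin, and the requirement that the dimension `D` be divisible by `K` are all assumptions made for illustration.

```python
import random


def oph_sketch(nonzeros, D, K, seed=0):
    """Illustrative OPH sketch with a re-randomized fill for empty bins.

    nonzeros: indices in [0, D) where the binary vector is non-zero.
    Assumes D is divisible by K (a simplification for this sketch).
    """
    if D % K != 0:
        raise ValueError("this sketch assumes D is divisible by K")
    width = D // K

    # One shared permutation of all D coordinates.
    perm = list(range(D))
    random.Random(f"{seed}:perm").shuffle(perm)

    # bins[b] holds the within-bin offsets of the permuted non-zeros.
    bins = [[] for _ in range(K)]
    for i in nonzeros:
        p = perm[i]
        bins[p // width].append(p % width)
    if not any(bins):
        raise ValueError("cannot sketch an all-zero vector")

    sketch = []
    for k in range(K):
        if bins[k]:
            # Non-empty bin: the hash is the smallest offset in the bin.
            sketch.append(min(bins[k]))
            continue
        # Empty bin: use a data-independent random stream keyed by (seed, k)
        # so that two vectors make the same random choices for bin k.
        rng = random.Random(f"{seed}:bin:{k}")
        local = list(range(width))
        rng.shuffle(local)  # fresh within-bin permutation for bin k
        while True:
            j = rng.randrange(K)  # candidate donor bin
            if bins[j]:
                break
        # Re-randomize: run MinHash on the donor bin under `local`
        # instead of copying the donor's existing hash value.
        sketch.append(min(local[o] for o in bins[j]))
    return sketch


def match_rate(s1, s2):
    """Fraction of positions where two sketches agree."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)
```

Two sketches built with the same `seed`, `D`, and `K` can be compared with `match_rate`, which serves here as a rough similarity estimate. The within-bin permutation is drawn before the donor search so that it does not depend on how many candidate bins were tried.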

bin-wise consistent weighted sampling, consistent weighted sampling, re-randomized densification, (4 more...)

Genre: Summary/Review (0.41)

Technology:

Information Technology > Data Science > Data Mining (0.81)
Information Technology > Artificial Intelligence > Machine Learning (0.64)

Engineering a Simplified 0-Bit Consistent Weighted Sampling

Edward Raff, Jared Sylvester, Charles Nicholas

arXiv.org Machine Learning, Mar-30-2018

The Min-Hashing approach to sketching has become an important tool in data analysis, search, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used and provides state-of-the-art performance for this problem space. However, ICWS suffers a computational burden as the sketch size K increases. We develop a new Simplified approach to the ICWS algorithm that enables us to obtain over 20x speedups compared to the standard algorithm. The validity of our approach is demonstrated empirically on multiple datasets, showing that our new Simplified CWS obtains the same quality of results while being an order of magnitude faster.
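To make the cost that the abstract refers to concrete, here is a minimal Python sketch of baseline ICWS as it is commonly stated, not the Simplified variant proposed in the paper. The function names, the dictionary input format, and the per-(hash, feature) string seeding are assumptions chosen for illustration. Every one of the K hashes visits every non-zero feature, which is where the burden of a growing K comes from.

```python
import math
import random


def icws_sketch(weights, K, seed=0):
    """Baseline ICWS sketch (illustrative, not the paper's Simplified CWS).

    weights: dict mapping feature id -> positive weight.
    Returns K samples (i*, t*), one per hash.
    """
    items = [(i, w) for i, w in weights.items() if w > 0]
    if not items:
        raise ValueError("need at least one positive weight")

    sketch = []
    for k in range(K):  # cost grows linearly with the sketch size K
        best = None
        for i, w in items:
            # Random draws depend only on (seed, k, i), never on the weight,
            # so different vectors see the same randomness per feature.
            rng = random.Random(f"{seed}:{k}:{i}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(w) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if best is None or a < best[0]:
                best = (a, i, t)
        sketch.append((best[1], best[2]))
    return sketch


def zero_bit(sketch):
    """0-bit variant: keep only the selected feature id, drop t*."""
    return [i for i, _ in sketch]


def match_rate(s1, s2):
    """Fraction of agreeing samples between two sketches."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)
```

For example, `match_rate(zero_bit(icws_sketch(u, 256)), zero_bit(icws_sketch(v, 256)))` gives a rough similarity estimate for two weight dictionaries `u` and `v` (hypothetical inputs) that share a seed. Creating one generator per (hash, feature) pair keeps the sketch easy to follow but is slow; a practical implementation would precompute or hash the random draws.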

artificial intelligence, data mining, machine learning, (15 more...)

arXiv:1804.00069

Country:

North America > United States > New York (0.28)
North America > United States > Maryland (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
